class: center, middle, inverse, title-slide # Databases ### Kirill & Nicolas ### cynkra GmbH ### March 15, 2022 --- <style type="text/css"> .pull-left { margin-top: -25px; } .pull-right { margin-top: -25px; } .remark-code { font-size: 14px; } .font17 { font-size: 17px; } .font14 { font-size: 14px; } </style> --- # Schedule Extract ⇨ Transform ⇨ Load ⇨ Consume .pull-left[ ## One table - Read whole tables - Let the database do the heavy lifting - Basic ETL for one table - Subtle issues to watch out for ] .pull-right[ ## Multiple tables - **Joins** - The {dm} package - A bit of theory - Playing the whole game ] ## Bonus: Bring your own data + questions --- background-image: url("data:image/png;base64,#images/21.webp") background-size: 40% background-position: 100% 100% # Joins .pull-left[ - Usage - Join sources - Mounting Script: `databases_21.R` ```r library(tidyverse) ``` ] --- background-image: url("data:image/png;base64,#images/21-frame.webp") background-size: 40% background-position: 100% 100% # Joins .pull-left[ - Usage - Join sources - Mounting Script: `databases_21.R` ```r library(tidyverse) ``` ] --- # A second table .pull-left[ ```r academy <- tbl(con_duckdb, "academy") academy_sqlite <- tbl(con_sqlite, "academy") academy %>% count(status) ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: lazy query [?? x 2]</span> <span style='color: #949494;'># Database: duckdb_connection</span> <span style='font-weight: bold;'>status</span> <span style='font-weight: bold;'>n</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #BCBCBC;'>1</span> Award not yet introduced 3 <span style='color: #BCBCBC;'>2</span> Nominated 36 <span style='color: #BCBCBC;'>3</span> Ineligible 23 <span style='color: #BCBCBC;'>4</span> Won Special Achievement 1 <span style='color: #BCBCBC;'>5</span> Won 17 </CODE></PRE> ] .pull-right[ ```r academy ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: table<academy> [?? x 3]</span> <span style='color: #949494;'># Database: duckdb_connection</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>award_type</span> <span style='font-weight: bold;'>status</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'> 1</span> Toy Story Animated Feature Award not yet introduced <span style='color: #BCBCBC;'> 2</span> Toy Story Original Screenplay Nominated <span style='color: #BCBCBC;'> 3</span> Toy Story Adapted Screenplay Ineligible <span style='color: #BCBCBC;'> 4</span> Toy Story Original Score Nominated <span style='color: #BCBCBC;'> 5</span> Toy Story Original Song Nominated <span style='color: #BCBCBC;'> 6</span> Toy Story Other Won Special Achievement <span style='color: #BCBCBC;'> 7</span> A Bug's Life Animated Feature Award not yet introduced <span style='color: #BCBCBC;'> 8</span> A Bug's Life Adapted Screenplay Ineligible <span style='color: #BCBCBC;'> 9</span> A Bug's Life Original Score Nominated <span style='color: #BCBCBC;'>10</span> Toy Story 2 Animated Feature Award not yet introduced <span style='color: #949494;'># … with more rows</span> </CODE></PRE> ] --- # Left join The most frequent kind of join. .pull-left[ ## Unsafe ```r academy %>% left_join(pixar_films) ``` <PRE class="fansi fansi-message"><CODE>Joining, by = "film" </CODE></PRE><PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: lazy query [?? x 7]</span> <span style='color: #949494;'># Database: duckdb_connection</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>award_type</span> <span style='font-weight: bold;'>status</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>release_d…</span> <span style='font-weight: bold;'>run_t…</span> <span style='font-weight: bold;'>film_…</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'> 1</span> Toy Story Animated Fea… Award… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 2</span> Toy Story Original Scr… Nomin… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 3</span> Toy Story Adapted Scre… Ineli… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 4</span> Toy Story Original Sco… Nomin… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 5</span> Toy Story Original Song Nomin… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 6</span> Toy Story Other Won S… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 7</span> A Bug's Life Animated Fea… Award… 2 1998-11-25 95 G <span style='color: #BCBCBC;'> 8</span> A Bug's Life Adapted Scre… Ineli… 2 1998-11-25 95 G <span style='color: #BCBCBC;'> 9</span> A Bug's Life Original Sco… Nomin… 2 1998-11-25 95 G <span style='color: #BCBCBC;'>10</span> Toy Story 2 Animated Fea… Award… 3 1999-11-24 92 G <span style='color: #949494;'># … with more rows</span> </CODE></PRE> ] .pull-right[ ## Better ```r academy %>% left_join(pixar_films, by = "film") ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: lazy query [?? x 7]</span> <span style='color: #949494;'># Database: duckdb_connection</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>award_type</span> <span style='font-weight: bold;'>status</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>release_d…</span> <span style='font-weight: bold;'>run_t…</span> <span style='font-weight: bold;'>film_…</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'> 1</span> Toy Story Animated Fea… Award… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 2</span> Toy Story Original Scr… Nomin… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 3</span> Toy Story Adapted Scre… Ineli… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 4</span> Toy Story Original Sco… Nomin… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 5</span> Toy Story Original Song Nomin… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 6</span> Toy Story Other Won S… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 7</span> A Bug's Life Animated Fea… Award… 2 1998-11-25 95 G <span style='color: #BCBCBC;'> 8</span> A Bug's Life Adapted Scre… Ineli… 2 1998-11-25 95 G <span style='color: #BCBCBC;'> 9</span> A Bug's Life Original Sco… Nomin… 2 1998-11-25 95 G <span style='color: #BCBCBC;'>10</span> Toy Story 2 Animated Fea… Award… 3 1999-11-24 92 G <span style='color: #949494;'># … with more rows</span> </CODE></PRE> ] --- # Left join Computed on the database, original data unchanged. ```r academy %>% left_join(pixar_films, by = "film") %>% show_query() ``` ``` <SQL> SELECT "LHS"."film" AS "film", "award_type", "status", "number", "release_date", "run_time", "film_rating" FROM "academy" AS "LHS" LEFT JOIN "pixar_films" AS "RHS" ON ("LHS"."film" = "RHS"."film") ``` --- # Join with preparation The right-hand side in the join should come from a variable. .pull-left[ ## Prepare RHS ```r academy_won <- academy %>% filter(status == "Won") %>% count(film, name = "n_won") academy_won ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: lazy query [?? x 2]</span> <span style='color: #949494;'># Database: duckdb_connection</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>n_won</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #BCBCBC;'> 1</span> Monsters, Inc. 1 <span style='color: #BCBCBC;'> 2</span> Finding Nemo 1 <span style='color: #BCBCBC;'> 3</span> The Incredibles 2 <span style='color: #BCBCBC;'> 4</span> Ratatouille 1 <span style='color: #BCBCBC;'> 5</span> WALL-E 1 <span style='color: #BCBCBC;'> 6</span> Up 2 <span style='color: #BCBCBC;'> 7</span> Toy Story 3 2 <span style='color: #BCBCBC;'> 8</span> Brave 1 <span style='color: #BCBCBC;'> 9</span> Inside Out 1 <span style='color: #BCBCBC;'>10</span> Coco 2 <span style='color: #949494;'># … with more rows</span> </CODE></PRE> ] .pull-right[ ```r pixar_films %>% left_join(academy_won, by = "film") ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: lazy query [?? x 6]</span> <span style='color: #949494;'># Database: duckdb_connection</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>release_date</span> <span style='font-weight: bold;'>run_time</span> <span style='font-weight: bold;'>film_rating</span> <span style='font-weight: bold;'>n_won</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #BCBCBC;'> 1</span> 4 Monsters, Inc. 2001-11-02 92 G 1 <span style='color: #BCBCBC;'> 2</span> 5 Finding Nemo 2003-05-30 100 G 1 <span style='color: #BCBCBC;'> 3</span> 6 The Incredibles 2004-11-05 115 PG 2 <span style='color: #BCBCBC;'> 4</span> 8 Ratatouille 2007-06-29 111 G 1 <span style='color: #BCBCBC;'> 5</span> 9 WALL-E 2008-06-27 98 G 1 <span style='color: #BCBCBC;'> 6</span> 10 Up 2009-05-29 96 PG 2 <span style='color: #BCBCBC;'> 7</span> 11 Toy Story 3 2010-06-18 103 G 2 <span style='color: #BCBCBC;'> 8</span> 13 Brave 2012-06-22 93 PG 1 <span style='color: #BCBCBC;'> 9</span> 15 Inside Out 2015-06-19 95 PG 1 <span style='color: #BCBCBC;'>10</span> 19 Coco 2017-11-22 105 PG 2 <span style='color: #949494;'># … with more rows</span> </CODE></PRE> ] --- # Join with postprocessing .pull-left[ ## Raw result ```r pixar_films %>% left_join(academy_won, by = "film") %>% arrange(release_date) ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: lazy query [?? x 6]</span> <span style='color: #949494;'># Database: duckdb_connection</span> <span style='color: #949494;'># Ordered by: release_date</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>release_date</span> <span style='font-weight: bold;'>run_time</span> <span style='font-weight: bold;'>film_rating</span> <span style='font-weight: bold;'>n_won</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #BCBCBC;'> 1</span> 1 Toy Story 1995-11-22 81 G <span style='color: #BB0000;'>NA</span> <span style='color: #BCBCBC;'> 2</span> 2 A Bug's Life 1998-11-25 95 G <span style='color: #BB0000;'>NA</span> <span style='color: #BCBCBC;'> 3</span> 3 Toy Story 2 1999-11-24 92 G <span style='color: #BB0000;'>NA</span> <span style='color: #BCBCBC;'> 4</span> 4 Monsters, Inc. 2001-11-02 92 G 1 <span style='color: #BCBCBC;'> 5</span> 5 Finding Nemo 2003-05-30 100 G 1 <span style='color: #BCBCBC;'> 6</span> 6 The Incredibles 2004-11-05 115 PG 2 <span style='color: #BCBCBC;'> 7</span> 7 Cars 2006-06-09 117 G <span style='color: #BB0000;'>NA</span> <span style='color: #BCBCBC;'> 8</span> 8 Ratatouille 2007-06-29 111 G 1 <span style='color: #BCBCBC;'> 9</span> 9 WALL-E 2008-06-27 98 G 1 <span style='color: #BCBCBC;'>10</span> 10 Up 2009-05-29 96 PG 2 <span style='color: #949494;'># … with more rows</span> </CODE></PRE> ] .pull-right[ ## After postprocessing ```r pixar_films %>% left_join(academy_won, by = "film") %>% mutate(n_won = coalesce(n_won, 0L)) %>% arrange(release_date) ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: lazy query [?? x 6]</span> <span style='color: #949494;'># Database: duckdb_connection</span> <span style='color: #949494;'># Ordered by: release_date</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>release_date</span> <span style='font-weight: bold;'>run_time</span> <span style='font-weight: bold;'>film_rating</span> <span style='font-weight: bold;'>n_won</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #BCBCBC;'> 1</span> 1 Toy Story 1995-11-22 81 G 0 <span style='color: #BCBCBC;'> 2</span> 2 A Bug's Life 1998-11-25 95 G 0 <span style='color: #BCBCBC;'> 3</span> 3 Toy Story 2 1999-11-24 92 G 0 <span style='color: #BCBCBC;'> 4</span> 4 Monsters, Inc. 2001-11-02 92 G 1 <span style='color: #BCBCBC;'> 5</span> 5 Finding Nemo 2003-05-30 100 G 1 <span style='color: #BCBCBC;'> 6</span> 6 The Incredibles 2004-11-05 115 PG 2 <span style='color: #BCBCBC;'> 7</span> 7 Cars 2006-06-09 117 G 0 <span style='color: #BCBCBC;'> 8</span> 8 Ratatouille 2007-06-29 111 G 1 <span style='color: #BCBCBC;'> 9</span> 9 WALL-E 2008-06-27 98 G 1 <span style='color: #BCBCBC;'>10</span> 10 Up 2009-05-29 96 PG 2 <span style='color: #949494;'># … with more rows</span> </CODE></PRE> ] --- # Join with processing Computed on the database, original data unchanged. ```r pixar_films %>% left_join(academy_won, by = "film") %>% mutate(n_won = coalesce(n_won, 0L)) %>% arrange(release_date) %>% show_query() ``` ``` <SQL> SELECT "number", "film", "release_date", "run_time", "film_rating", COALESCE("n_won", 0) AS "n_won" FROM (SELECT "number", "LHS"."film" AS "film", "release_date", "run_time", "film_rating", "n_won" FROM "pixar_films" AS "LHS" LEFT JOIN (SELECT "film", COUNT(*) AS "n_won" FROM "academy" WHERE ("status" = 'Won') GROUP BY "film") "RHS" ON ("LHS"."film" = "RHS"."film") ) "q01" ORDER BY "release_date" ``` --- # Tables must be on the same source Use `copy = TRUE` to enforce, the result is a lazy table if the LHS is a lazy table. .pull-left[ ## Bad ```r try( academy %>% left_join(pixar_films_sqlite, by = "film") ) ``` <PRE class="fansi fansi-output"><CODE>Error in auto_copy(x, y, copy = copy, indexes = if (auto_index) list(by$y)) : `x` and `y` must share the same src. <span style='color: #00BBBB;'>ℹ</span> set `copy` = TRUE (may be slow). </CODE></PRE> ] .pull-right[ ## Not too bad ```r academy %>% left_join(pixar_films_sqlite, by = "film", copy = TRUE) ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: lazy query [?? x 7]</span> <span style='color: #949494;'># Database: duckdb_connection</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>award_type</span> <span style='font-weight: bold;'>status</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>release_d…</span> <span style='font-weight: bold;'>run_t…</span> <span style='font-weight: bold;'>film_…</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'> 1</span> Toy Story Animated Fea… Award… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 2</span> Toy Story Original Scr… Nomin… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 3</span> Toy Story Adapted Scre… Ineli… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 4</span> Toy Story Original Sco… Nomin… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 5</span> Toy Story Original Song Nomin… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 6</span> Toy Story Other Won S… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 7</span> A Bug's Life Animated Fea… Award… 2 1998-11-25 95 G <span style='color: #BCBCBC;'> 8</span> A Bug's Life Adapted Scre… Ineli… 2 1998-11-25 95 G <span style='color: #BCBCBC;'> 9</span> A Bug's Life Original Sco… Nomin… 2 1998-11-25 95 G <span style='color: #BCBCBC;'>10</span> Toy Story 2 Animated Fea… Award… 3 1999-11-24 92 G <span style='color: #949494;'># … with more rows</span> </CODE></PRE> ] --- # Copying is expensive! A temporary table is created on the LHS database. If the RHS comes from a different database, results are temporarily loaded into the local session! ```r academy %>% left_join(pixar_films_sqlite, by = "film", copy = TRUE) %>% show_query() ``` ``` <SQL> SELECT "LHS"."film" AS "film", "award_type", "status", "number", "release_date", "run_time", "film_rating" FROM "academy" AS "LHS" LEFT JOIN "dbplyr_002" AS "RHS" ON ("LHS"."film" = "RHS"."film") ``` --- # Joining data frames with lazy tables The result is a data frame too. .pull-left[ ## Bad ```r try( pixarfilms::academy %>% left_join(pixar_films, by = "film") ) ``` <PRE class="fansi fansi-output"><CODE>Error in auto_copy(x, y, copy = copy) : `x` and `y` must share the same src. <span style='color: #00BBBB;'>ℹ</span> set `copy` = TRUE (may be slow). </CODE></PRE> ] .pull-right[ ## Could be worse ```r pixarfilms::academy %>% left_join(pixar_films, by = "film", copy = TRUE) ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># A tibble: 80 × 7</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>award_type</span> <span style='font-weight: bold;'>status</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>release_d…</span> <span style='font-weight: bold;'>run_t…</span> <span style='font-weight: bold;'>film_…</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'> 1</span> Toy Story Animated Fea… Award… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 2</span> Toy Story Original Scr… Nomin… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 3</span> Toy Story Adapted Scre… Ineli… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 4</span> Toy Story Original Sco… Nomin… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 5</span> Toy Story Original Song Nomin… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 6</span> Toy Story Other Won S… 1 1995-11-22 81 G <span style='color: #BCBCBC;'> 7</span> A Bug's Life Animated Fea… Award… 2 1998-11-25 95 G <span style='color: #BCBCBC;'> 8</span> A Bug's Life Adapted Scre… Ineli… 2 1998-11-25 95 G <span style='color: #BCBCBC;'> 9</span> A Bug's Life Original Sco… Nomin… 2 1998-11-25 95 G <span style='color: #BCBCBC;'>10</span> Toy Story 2 Animated Fea… Award… 3 1999-11-24 92 G <span style='color: #949494;'># … with 70 more rows</span> </CODE></PRE> ] --- # DuckDB: register data frames as database tables Temporarily use a local data frame as a table. Also works for Arrow datasets via `duckdb::duckdb_register_arrow()`. .pull-left[ ## Register and access ```r duckdb::duckdb_register( con_duckdb, "academy_small", pixarfilms::academy[1:3, ] ) academy_small <- tbl(con_duckdb, "academy_small") academy_small ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: table<academy_small> [?? x 3]</span> <span style='color: #949494;'># Database: duckdb_connection</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>award_type</span> <span style='font-weight: bold;'>status</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'>1</span> Toy Story Animated Feature Award not yet introduced <span style='color: #BCBCBC;'>2</span> Toy Story Original Screenplay Nominated <span style='color: #BCBCBC;'>3</span> Toy Story Adapted Screenplay Ineligible </CODE></PRE> ] .pull-right[ ## Use ```r academy_small %>% left_join(pixar_films, by = "film") ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: lazy query [?? x 7]</span> <span style='color: #949494;'># Database: duckdb_connection</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>award_type</span> <span style='font-weight: bold;'>status</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>release_d…</span> <span style='font-weight: bold;'>run_t…</span> <span style='font-weight: bold;'>film_…</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'>1</span> Toy Story Adapted Screenpl… Ineli… 1 1995-11-22 81 G <span style='color: #BCBCBC;'>2</span> Toy Story Original Screenp… Nomin… 1 1995-11-22 81 G <span style='color: #BCBCBC;'>3</span> Toy Story Animated Feature Award… 1 1995-11-22 81 G </CODE></PRE> ] --- # DuckDB: Performance comparison Baseline: Data frames. .pull-left[ ```r nrow(nycflights13::flights) ``` ``` [1] 336776 ``` ] .pull-right[ ```r system.time( nycflights13::flights %>% count(year, month, day) ) ``` ``` user system elapsed 0.01 0.00 0.01 ``` ] --- # DuckDB: Performance comparison With registration. .pull-left[ ```r system.time(duckdb::duckdb_register( con_duckdb, "flights", nycflights13::flights )) ``` ``` user system elapsed 0.003 0.001 0.003 ``` ```r flights_register <- tbl(con_duckdb, "flights") flights_register %>% count() ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: lazy query [?? x 1]</span> <span style='color: #949494;'># Database: duckdb_connection</span> <span style='font-weight: bold;'>n</span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #BCBCBC;'>1</span> <span style='text-decoration: underline;'>336</span>776 </CODE></PRE> ] .pull-right[ ```r system.time( flights_register %>% count(year, month, day) %>% collect() ) ``` ``` user system elapsed 0.031 0.000 0.030 ``` ] --- # DuckDB: Performance comparison With copy. .pull-left[ ```r system.time( flights_copy <- copy_to(con_duckdb, nycflights13::flights) ) ``` ``` user system elapsed 0.087 0.008 0.095 ``` ```r flights_copy %>% count() ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: lazy query [?? x 1]</span> <span style='color: #949494;'># Database: duckdb_connection</span> <span style='font-weight: bold;'>n</span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #BCBCBC;'>1</span> <span style='text-decoration: underline;'>336</span>776 </CODE></PRE> ] .pull-right[ ```r system.time( flights_copy %>% count(year, month, day) %>% collect() ) ``` ``` user system elapsed 0.008 0.000 0.006 ``` ] --- # ETL, revisited Insert a second table into our database. ```r db_path <- fs::path_abs("pixar.duckdb") con <- DBI::dbConnect(duckdb::duckdb(dbdir = db_path)) DBI::dbWriteTable(con, "academy", pixarfilms::academy, overwrite = TRUE) DBI::dbExecute(con, "CREATE UNIQUE INDEX academy_pk ON academy (film, award_type)") ``` ``` [1] 0 ``` ```r DBI::dbExecute(con, "CREATE INDEX academy_fk ON academy (film)") ``` ``` [1] 0 ``` ```r DBI::dbDisconnect(con) ``` --- background-image: url("data:image/png;base64,#images/21-frame.webp") background-size: 40% background-position: 100% 100% # Joins: Exercises 1 .pull-left[ 1. How many rows does the join between `academy` and `pixar_films` contain? Try to find out without loading all the data into memory. Explain. 2. Which films are not yet listed in the `academy` table? What does the resulting SQL query look like? - Hint: Use `anti_join()` ] --- background-image: url("data:image/png;base64,#images/21-frame.webp") background-size: 40% background-position: 100% 100% # Joins: Exercises 2 .pull-left[ 3. Transform `academy` into a wide table so that there is at most one row per film. Join the resulting table with the `pixar_films` table. - Hint: Use `pivot_wider()`, `spread()`, `dcast()`, ... . You need to compute locally, because these functions don't work on the database. ] .pull-right[ 4. Plot a bar chart with the number of awards won and nominated per year. Compute as much as possible on the database. - Hint: "Long form" or "wide form"? ] --- # Schedule Extract ⇨ Transform ⇨ Load ⇨ Consume .pull-left[ ## One table - Read whole tables - Let the database do the heavy lifting - Basic ETL for one table - Subtle issues to watch out for ] .pull-right[ ## Multiple tables - Joins - **The {dm} package** - A bit of theory - Playing the whole game ] ## Bonus: Bring your own data + questions --- background-image: url("data:image/png;base64,#images/22.webp") background-size: 40% background-position: 100% 100% # Data models .pull-left[ - Compound object for multiple tables - Features Script: `databases_22.R` ```r library(tidyverse) library(dm) ``` ] --- background-image: url("data:image/png;base64,#images/22-frame.webp") background-size: 40% background-position: 100% 100% # Data models .pull-left[ - Compound object for multiple tables - Features Script: `databases_22.R` ```r library(tidyverse) library(dm) ``` ] --- # Data model objects .pull-left[ Store multiple tables in an object. ```r pixar_dm <- dm_pixarfilms() pixar_dm ``` <PRE class="fansi fansi-output"><CODE><span style='color: #FFAFFF;'>──</span> <span style='color: #FFAFFF;'>Metadata</span> <span style='color: #FFAFFF;'>────────────────────────────────────────────────────────</span> Tables: `pixar_films`, `pixar_people`, `academy`, `box_office`, `genres`, `public_response` Columns: 23 Primary keys: 5 Foreign keys: 5 </CODE></PRE> ```r pixar_dm %>% dm_draw() ``` ] .pull-right[ <!-- Generated by graphviz version 2.40.1 (20161225.0304) --> <!-- Title: %0 Pages: 1 --> <svg width="315pt" height="330pt" viewBox="-100.00 0.00 315.00 330.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"> <g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 326)"> <title>%0</title> <g id="a_graph0"><a xlink:title="Data Model"> <polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-326 211,-326 211,4 -4,4"/> </a> </g> <!-- academy --> <g id="node1" class="node"> <title>academy</title> <polygon fill="#ed7d31" stroke="transparent" points="1.5,-301 1.5,-321 100.5,-321 100.5,-301 1.5,-301"/> <text text-anchor="start" x="26.1255" y="-306.4" font-family="Times,serif" font-size="14.00" fill="#ffffff">academy</text> <polygon fill="#fbe5d5" stroke="transparent" points="1.5,-281 1.5,-301 100.5,-301 100.5,-281 1.5,-281"/> <text text-anchor="start" x="3.5" y="-286.4" font-family="Times,serif" font-size="14.00" fill="#444444">film</text> <polygon fill="#fbe5d5" stroke="transparent" points="1.5,-261 1.5,-281 100.5,-281 100.5,-261 1.5,-261"/> <text text-anchor="start" x="3.183" y="-267.4" font-family="Times,serif" text-decoration="underline" font-size="14.00" fill="#444444">film, award_type</text> <polygon fill="none" stroke="#9e5320" stroke-opacity="0.666667" points="0,-260 0,-322 101,-322 101,-260 0,-260"/> </g> <!-- pixar_films --> <g id="node4" class="node"> <title>pixar_films</title> <polygon fill="#5b9bd5" stroke="transparent" points="138,-161 138,-181 206,-181 206,-161 138,-161"/> <text text-anchor="start" x="139.7286" y="-166.4" font-family="Times,serif" font-size="14.00" fill="#ffffff">pixar_films</text> <polygon fill="#deebf6" stroke="transparent" points="138,-141 138,-161 206,-161 206,-141 138,-141"/> <text text-anchor="start" x="140" y="-147.4" font-family="Times,serif" text-decoration="underline" font-size="14.00" fill="#444444">film</text> <polygon fill="none" stroke="#3c678e" stroke-opacity="0.666667" points="137,-140 137,-182 207,-182 207,-140 137,-140"/> </g> <!-- academy->pixar_films --> <g id="edge2" class="edge"> <title>academy:film->pixar_films:film</title> <path fill="none" stroke="#555555" d="M100.5,-291C161.0155,-291 84.073,-167.4399 127.9432,-152.4775"/> <polygon fill="#555555" stroke="#555555" points="128.615,-155.9165 138,-151 127.5974,-148.9909 128.615,-155.9165"/> </g> <!-- box_office --> <g id="node2" class="node"> <title>box_office</title> <polygon fill="#ed7d31" stroke="transparent" points="18.5,-221 18.5,-241 82.5,-241 82.5,-221 18.5,-221"/> <text text-anchor="start" x="20.1795" y="-226.4" font-family="Times,serif" font-size="14.00" fill="#ffffff">box_office</text> <polygon fill="#fbe5d5" stroke="transparent" points="18.5,-201 18.5,-221 82.5,-221 82.5,-201 18.5,-201"/> <text text-anchor="start" x="20.5" y="-207.4" font-family="Times,serif" text-decoration="underline" font-size="14.00" fill="#444444">film</text> <polygon fill="none" stroke="#9e5320" stroke-opacity="0.666667" points="17.5,-200 17.5,-242 83.5,-242 83.5,-200 17.5,-200"/> </g> <!-- box_office->pixar_films --> <g id="edge3" class="edge"> <title>box_office:film->pixar_films:film</title> <path fill="none" stroke="#555555" d="M82.5,-211C114.9945,-211 104.6973,-162.9888 128.1654,-152.8615"/> <polygon fill="#555555" stroke="#555555" points="128.8254,-156.2988 138,-151 127.5235,-149.4209 128.8254,-156.2988"/> </g> <!-- genres --> <g id="node3" class="node"> <title>genres</title> <polygon fill="#ed7d31" stroke="transparent" points="18.5,-161 18.5,-181 83.5,-181 83.5,-161 18.5,-161"/> <text text-anchor="start" x="32.7328" y="-166.4" font-family="Times,serif" font-size="14.00" fill="#ffffff">genres</text> <polygon fill="#fbe5d5" stroke="transparent" points="18.5,-141 18.5,-161 83.5,-161 83.5,-141 18.5,-141"/> <text text-anchor="start" x="20.5" y="-146.4" font-family="Times,serif" font-size="14.00" fill="#444444">film</text> <polygon fill="#fbe5d5" stroke="transparent" points="18.5,-121 18.5,-141 83.5,-141 83.5,-121 18.5,-121"/> <text text-anchor="start" x="20.2889" y="-127.4" font-family="Times,serif" text-decoration="underline" font-size="14.00" fill="#444444">film, genre</text> <polygon fill="none" stroke="#9e5320" stroke-opacity="0.666667" points="17,-120 17,-182 84,-182 84,-120 17,-120"/> </g> <!-- genres->pixar_films --> <g id="edge4" class="edge"> <title>genres:film->pixar_films:film</title> <path fill="none" stroke="#555555" d="M83.5,-151C103.9375,-151 111.4419,-151 127.8378,-151"/> <polygon fill="#555555" stroke="#555555" points="128,-154.5001 138,-151 128,-147.5001 128,-154.5001"/> </g> <!-- pixar_people --> <g id="node5" class="node"> <title>pixar_people</title> <polygon fill="#70ad47" stroke="transparent" points="12.5,-81 12.5,-101 89.5,-101 89.5,-81 12.5,-81"/> <text text-anchor="start" x="14.4572" y="-86.4" font-family="Times,serif" font-size="14.00" fill="#ffffff">pixar_people</text> <polygon fill="#e2eeda" stroke="transparent" points="12.5,-61 12.5,-81 89.5,-81 89.5,-61 12.5,-61"/> <text text-anchor="start" x="14.5" y="-66.4" font-family="Times,serif" font-size="14.00" fill="#444444">film</text> <polygon fill="none" stroke="#4a732f" stroke-opacity="0.666667" points="11,-60 11,-102 90,-102 90,-60 11,-60"/> </g> <!-- pixar_people->pixar_films --> <g id="edge1" class="edge"> <title>pixar_people:film->pixar_films:film</title> <path fill="none" stroke="#555555" d="M89.5,-71C127.1813,-71 102.2491,-136.7031 128.1904,-149.0225"/> <polygon fill="#555555" stroke="#555555" points="127.5055,-152.4547 138,-151 128.8889,-145.5928 127.5055,-152.4547"/> </g> <!-- public_response --> <g id="node6" class="node"> <title>public_response</title> <polygon fill="#ed7d31" stroke="transparent" points="3.5,-21 3.5,-41 97.5,-41 97.5,-21 3.5,-21"/> <text text-anchor="start" x="5.0126" y="-26.4" font-family="Times,serif" font-size="14.00" fill="#ffffff">public_response</text> <polygon fill="#fbe5d5" stroke="transparent" points="3.5,-1 3.5,-21 97.5,-21 97.5,-1 3.5,-1"/> <text text-anchor="start" x="5.5" y="-7.4" font-family="Times,serif" text-decoration="underline" font-size="14.00" fill="#444444">film</text> <polygon fill="none" stroke="#9e5320" stroke-opacity="0.666667" points="2.5,0 2.5,-42 98.5,-42 98.5,0 2.5,0"/> </g> <!-- public_response->pixar_films --> <g id="edge5" class="edge"> <title>public_response:film->pixar_films:film</title> <path fill="none" stroke="#555555" d="M97.5,-11C158.3517,-11 83.4456,-134.5601 127.8579,-149.5225"/> <polygon fill="#555555" stroke="#555555" points="127.5999,-153.0217 138,-151 128.6091,-146.0949 127.5999,-153.0217"/> </g> </g> </svg> ] --- # Data model objects Use like a named list. ```r names(pixar_dm) ``` ``` [1] "pixar_films" "pixar_people" "academy" "box_office" "genres" "public_response" ``` .pull-left[ ```r pixar_dm$pixar_films ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># A tibble: 27 × 5</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>release_date</span> <span style='font-weight: bold;'>run_time</span> <span style='font-weight: bold;'>film_rating</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'> 1</span> 1 Toy Story 1995-11-22 81 G <span style='color: #BCBCBC;'> 2</span> 2 A Bug's Life 1998-11-25 95 G <span style='color: #BCBCBC;'> 3</span> 3 Toy Story 2 1999-11-24 92 G <span style='color: #BCBCBC;'> 4</span> 4 Monsters, Inc. 2001-11-02 92 G <span style='color: #BCBCBC;'> 5</span> 5 Finding Nemo 2003-05-30 100 G <span style='color: #BCBCBC;'> 6</span> 6 The Incredibles 2004-11-05 115 PG <span style='color: #BCBCBC;'> 7</span> 7 Cars 2006-06-09 117 G <span style='color: #BCBCBC;'> 8</span> 8 Ratatouille 2007-06-29 111 G <span style='color: #BCBCBC;'> 9</span> 9 WALL-E 2008-06-27 98 G <span style='color: #BCBCBC;'>10</span> 10 Up 2009-05-29 96 PG <span style='color: #949494;'># … with 17 more rows</span> </CODE></PRE> ] .pull-right[ ```r pixar_dm$academy ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># A tibble: 80 × 3</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>award_type</span> <span style='font-weight: bold;'>status</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'> 1</span> Toy Story Animated Feature Award not yet introduced <span style='color: #BCBCBC;'> 2</span> Toy Story Original Screenplay Nominated <span style='color: #BCBCBC;'> 3</span> Toy Story Adapted Screenplay Ineligible <span style='color: #BCBCBC;'> 4</span> Toy Story Original Score Nominated <span style='color: #BCBCBC;'> 5</span> Toy Story Original Song Nominated <span style='color: #BCBCBC;'> 6</span> Toy Story Other Won Special Achievement <span style='color: #BCBCBC;'> 7</span> A Bug's Life Animated Feature Award not yet introduced <span style='color: #BCBCBC;'> 8</span> A Bug's Life Adapted Screenplay Ineligible <span style='color: #BCBCBC;'> 9</span> A Bug's Life Original Score Nominated <span style='color: #BCBCBC;'>10</span> Toy Story 2 Animated Feature Award not yet introduced <span style='color: #949494;'># … with 70 more rows</span> </CODE></PRE> ] --- # Showcase: wrapping all tables in a data model One of the many operations supported by {dm}. .pull-left[ ```r pixar_films_wrapped <- pixar_dm %>% dm_wrap_tbl(pixar_films) %>% pull_tbl(pixar_films) pixar_films_wrapped ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># A tibble: 27 × 10</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>release_d…</span> <span style='font-weight: bold;'>run_t…</span> <span style='font-weight: bold;'>film_…</span> <span style='font-weight: bold;'>pixar_p…</span> <span style='font-weight: bold;'>academy</span> <span style='font-weight: bold;'>box_off…</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><nested></span> <span style='color: #949494; font-style: italic;'><nested></span> <span style='color: #949494; font-style: italic;'><nested></span> <span style='color: #BCBCBC;'> 1</span> 1 Toy S… 1995-11-22 81 G <tibble> <tibble> <tibble> <span style='color: #BCBCBC;'> 2</span> 2 A Bug… 1998-11-25 95 G <tibble> <tibble> <tibble> <span style='color: #BCBCBC;'> 3</span> 3 Toy S… 1999-11-24 92 G <tibble> <tibble> <tibble> <span style='color: #BCBCBC;'> 4</span> 4 Monst… 2001-11-02 92 G <tibble> <tibble> <tibble> <span style='color: #BCBCBC;'> 5</span> 5 Findi… 2003-05-30 100 G <tibble> <tibble> <tibble> <span style='color: #BCBCBC;'> 6</span> 6 The I… 2004-11-05 115 PG <tibble> <tibble> <tibble> <span style='color: #BCBCBC;'> 7</span> 7 Cars 2006-06-09 117 G <tibble> <tibble> <tibble> <span style='color: #BCBCBC;'> 8</span> 8 Ratat… 2007-06-29 111 G <tibble> <tibble> <tibble> <span style='color: #BCBCBC;'> 9</span> 9 WALL-E 2008-06-27 98 G <tibble> <tibble> <tibble> <span style='color: #BCBCBC;'>10</span> 10 Up 2009-05-29 96 PG <tibble> <tibble> <tibble> <span style='color: #949494;'># … with 17 more rows, and 2 more variables: </span><span style='color: #949494; font-weight: bold;'>genres</span><span style='color: #949494;'> <nested>,</span> <span style='color: #949494;'># </span><span style='color: #949494; font-weight: bold;'>public_response</span><span style='color: #949494;'> <nested></span> </CODE></PRE> ] .pull-right[ ```r pixar_films_wrapped$academy[1:2] ``` <PRE class="fansi fansi-output"><CODE>[[1]] <span style='color: #949494;'># A tibble: 6 × 2</span> <span style='font-weight: bold;'>award_type</span> <span style='font-weight: bold;'>status</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'>1</span> Animated Feature Award not yet introduced <span style='color: #BCBCBC;'>2</span> Original Screenplay Nominated <span style='color: #BCBCBC;'>3</span> Adapted Screenplay Ineligible <span style='color: #BCBCBC;'>4</span> Original Score Nominated <span style='color: #BCBCBC;'>5</span> Original Song Nominated <span style='color: #BCBCBC;'>6</span> Other Won Special Achievement [[2]] <span style='color: #949494;'># A tibble: 3 × 2</span> <span style='font-weight: bold;'>award_type</span> <span style='font-weight: bold;'>status</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'>1</span> Animated Feature Award not yet introduced <span style='color: #BCBCBC;'>2</span> Adapted Screenplay Ineligible <span style='color: #BCBCBC;'>3</span> Original Score Nominated </CODE></PRE> ] --- # Schedule Extract ⇨ Transform ⇨ Load ⇨ Consume .pull-left[ ## One table - Read whole tables - Let the database do the heavy lifting - Basic ETL for one table - Subtle issues to watch out for ] .pull-right[ ## Multiple tables - Joins - The {dm} package - **A bit of theory** - Playing the whole game ] ## Bonus: Bring your own data + questions --- background-image: url("data:image/png;base64,#images/23.webp") background-size: 40% background-position: 100% 100% # Data models .pull-left[ - Keys, relationships, constraints - Zooming Script: `databases_23.R` ```r library(tidyverse) library(dm) ``` ] --- background-image: url("data:image/png;base64,#images/23-frame.webp") background-size: 40% background-position: 100% 100% # Data models .pull-left[ - Keys, relationships, constraints - Zooming Script: `databases_23.R` ```r library(tidyverse) library(dm) ``` ] --- # Schedule Extract ⇨ Transform ⇨ Load ⇨ Consume .pull-left[ ## One table - Read whole tables - Let the database do the heavy lifting - Basic ETL for one table - Subtle issues to watch out for ] .pull-right[ ## Multiple tables - Joins - The {dm} package - A bit of theory - **Playing the whole game** ] ## Bonus: Bring your own data + questions --- background-image: url("data:image/png;base64,#images/24.webp") background-size: 40% background-position: 100% 100% # The whole game .pull-left[ - Build a local data model - Copy it to the database - Consume it Script: `databases_24.R` ```r library(tidyverse) library(dm) ``` ] --- background-image: url("data:image/png;base64,#images/24-frame.webp") background-size: 40% background-position: 100% 100% # The whole game .pull-left[ - Build a local data model - Copy it to the database - Consume it Script: `databases_24.R` ```r library(tidyverse) library(dm) ``` ] --- # Recap <table> <tr> <td rowspan=2> <img src="data:image/png;base64,#images/11-frame.webp" width="200px" /> </td> <td><img src="data:image/png;base64,#images/12-frame.webp" width="200px" /></td> <td><img src="data:image/png;base64,#images/12_2-frame.webp" width="200px" /></td> <td><img src="data:image/png;base64,#images/13-frame.webp" width="200px" /></td> <td><img src="data:image/png;base64,#images/14-frame.webp" width="200px" /></td> </tr> <tr style="background:transparent"> <td><img src="data:image/png;base64,#images/21-frame.webp" width="200px" /></td> <td><img src="data:image/png;base64,#images/22-frame.webp" width="200px" /></td> <td><img src="data:image/png;base64,#images/23-frame.webp" width="200px" /></td> <td><img src="data:image/png;base64,#images/24-frame.webp" width="200px" /></td> </tr> </table> --- # Schedule Extract ⇨ Transform ⇨ Load ⇨ Consume .pull-left[ ## One table - Read whole tables - Let the database do the heavy lifting - Basic ETL for one table - Subtle issues to watch out for ] .pull-right[ ## Multiple tables - Joins - The {dm} package - A bit of theory - Playing the whole game ] ## Bonus: Bring your own data + questions